Wildcards


In the section Basics you have read about the different fields of a rule. Now it's time to learn how to use wildcards in those fields.


What is a wildcard?

Consider a task that requires finding different parts of a string. These parts are typically determined by delimiting characters or by other criteria that apply to each substring (e.g. digits only).

Example:
Assume that you have the full names of several persons and want to extract the first name and the last name of each person. The only thing you know is that the first and the last name are separated by a space. You can combine this information with two wildcards. Let's say that "*" is a placeholder for "any number of characters" and you have defined the rule "* *". The rule will match each person's name, for example:

Input: "Andreas Pardeike"
Rule = "* *"
First * = "Andreas"
Second * = "Pardeike"
Expanded rule = "Andreas" + space + "Pardeike" = "Andreas Pardeike"

These are the basics of using wildcards. You define a string that contains wildcards and other characters, and Welcome tries to find a value for each wildcard so that the string with the expanded wildcards is identical to the string to match.
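
The idea can be sketched in Python. This is only an illustration of the matching concept, not the way Welcome is implemented. The function name match_two_stars is hypothetical, and the sketch assumes that the first wildcard takes the shortest possible value.

```python
def match_two_stars(text):
    """Match text against the rule "* *" and return the two wildcard values.

    Assumption: the first wildcard takes the shortest possible value,
    so the text is split at the first space. Returns None if the text
    contains no space and therefore cannot match the rule.
    """
    first, separator, second = text.partition(" ")
    if not separator:
        return None
    return first, second


values = match_two_stars("Andreas Pardeike")
print(values)

# Expanding the rule with the values found reproduces the input.
if values is not None:
    first, second = values
    print(first + " " + second == "Andreas Pardeike")
```

A name without a space, such as "Andreas", yields None because the literal space in the rule has nothing to match.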

To be more flexible, Welcome offers several different wildcards. Each of them matches only particular strings or characters, which gives you more control over how the input is matched.


Restrictions

Wildcards are constructed from special characters. If you want to use one of these characters in a literal text part between wildcards, you need to "escape" it with the escape character (\). So instead of "cgi-bin" you need to write "cgi\-bin". This applies to each of the following characters: + - * | [ ] { } ^ \
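
If you generate rules with a script, a small helper can do the escaping for you. The following Python sketch is an assumption about how you might prepare literal text before pasting it into a rule. The name escape_literal is hypothetical and not part of Welcome.

```python
# The characters listed above that need an escape in literal text.
SPECIAL_CHARACTERS = set("+-*|[]{}^\\")


def escape_literal(text):
    """Prefix every special character in text with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARACTERS else ch for ch in text)


print(escape_literal("cgi-bin"))
```

Text without special characters is returned unchanged.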


Possible wildcards

The wildcards of Welcome are designed to match URL parts. There are six different wildcards:

*
This is the most commonly used wildcard. It matches the shortest possible string. As long as you don't use two * in a single expression, you don't have to worry about choosing between this and the next wildcard. Restriction: you cannot use another wildcard right after this wildcard.

| (vertical bar)
This wildcard is similar to the previous one, but it matches the longest possible string. You don't need this wildcard if you use only one * in a single expression. However, if you use two or more *, you will run into trouble because there might be more than one solution for your expression. Restriction: you cannot use another wildcard right after this wildcard.

Example: Let's take an input string of "Pardeikes Welcome Plugin" and a rule expression of "* *". There are two different solutions to this puzzle:
a) *1 = "Pardeikes", *2 = "Welcome Plugin"
b) *1 = "Pardeikes Welcome", *2 = "Plugin"
To avoid this ambiguity, you can use "* |" or "| *" instead of "* *". "* |" will result in a) and "| *" will result in b).
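
The difference between the two wildcards resembles non-greedy and greedy quantifiers in regular expressions. The Python sketch below uses the standard re module as an analogy only. It assumes that * behaves like ".*?" and | like ".*", which is an approximation and not Welcome's own matching code.

```python
import re

text = "Pardeikes Welcome Plugin"

# "* |": shortest match first, longest match second -> solution a)
shortest_first = re.fullmatch(r"(.*?) (.*)", text)

# "| *": longest match first, shortest match second -> solution b)
longest_first = re.fullmatch(r"(.*) (.*?)", text)

print(shortest_first.groups())
print(longest_first.groups())
```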

+
This matches exactly one character.

[range]
To match one or more characters from a specific set of characters, you can use the range wildcard. Inside the brackets, you can specify:

  • a single character e.g. "x"
  • a range of characters, e.g. "a-z", "0-9" or "A-M"
  • a negation sign (^) right after the opening bracket. It makes the wildcard match only characters that are not in the range.

Examples:

  • [a-z] will match "andreas" or "x"
  • [^0-9] will match "34x8" but not "2397"
  • [a-z0-9+] will match any string that contains only lowercase letters, digits and the + sign, like "ab+23+"
  • [aeiou] will match any string that contains only vowels, like "aaai" or "uou"
  • [^/] will match any string that does not contain a "/". This is useful to match a subfolder in a URL
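
Some of these examples can be tried out with regular expression character classes, which use a very similar notation. The Python sketch below is an analogy under the assumption that a range wildcard consumes one or more characters. The helper name range_matches is hypothetical, and Welcome's own matching may differ in detail.

```python
import re


def range_matches(char_class, text):
    """Return True if every character of text belongs to the class.

    char_class is the content between the brackets, e.g. "a-z" or "^/".
    Assumption: the range wildcard consumes one or more characters.
    """
    return re.fullmatch("[" + char_class + "]+", text) is not None


print(range_matches("a-z", "andreas"))
print(range_matches("aeiou", "uou"))
print(range_matches("^/", "subfolder"))
print(range_matches("^/", "sub/folder"))
```

The last call returns False because the string contains a "/", which the negated range excludes.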

{option1 option2 ... }
To match one of several alternative strings, you can use this wildcard. Inside the braces, you can list any number of alternative strings. If the last word in the list is NULL, the whole wildcard can also match the empty string if necessary.

Examples:

  • {www ftp} will match "www" or "ftp"
  • {a e i o u NULL} will match either "a", "e", "i", "o", "u" or an empty string
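
The option wildcard corresponds roughly to an alternation group in regular expressions. The Python sketch below translates the text between the braces into such a group. The function name options_to_regex is hypothetical, and the translation is an assumption made for illustration, not Welcome's implementation.

```python
import re


def options_to_regex(options):
    """Build a regular expression for a {option1 option2 ...} wildcard.

    options is the space-separated text between the braces. A trailing
    NULL makes the whole group optional, so it may match an empty string.
    """
    words = options.split()
    allow_empty = bool(words) and words[-1] == "NULL"
    if allow_empty:
        words = words[:-1]
    group = "(?:" + "|".join(re.escape(word) for word in words) + ")"
    return group + "?" if allow_empty else group


print(options_to_regex("www ftp"))
print(options_to_regex("a e i o u NULL"))
```

Used with re.fullmatch, the first pattern accepts "www" or "ftp" only, while the second also accepts the empty string.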